(19) Europäisches Patentamt
     European Patent Office
     Office européen des brevets

(11) EP 0 747 816 A2

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: 11.12.1996 Bulletin 1996/50

(51) Int. Cl.6: G06F 9/46

(21) Application number: 96480079.1

(22) Date of filing: 31.05.1996

(84) Designated Contracting States: DE FR GB

(30) Priority: 07.06.1995 US 473692

(71) Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
     Armonk, NY 10504 (US)

(72) Inventors:
     • Eickemeyer, Richard James, Rochester, MN 55901 (US)
     • Johnson, Ross Evan, Rochester, MN 55906 (US)
     • Kossman, Harold F., Rochester, MN 55906 (US)
     • Kunkel, Steven Raymond, Rochester, MN 55901 (US)
     • Mullins, Timothy John, Rochester, MN 55901 (US)
     • Rose, James Allen, Rochester, MN 55901 (US)

(74) Representative: Lattard, Nicole
     Compagnie IBM France
     Département de Propriété Intellectuelle
     06610 La Gaude (FR)

(54) Method and system for high performance multithread operation in a data processing system






(57) A method and system for enhanced perform- 
ance multithread operation in a data processing system 
which includes a processor, a main memory store and 
at least two levels of cache memory. At least one instruc- 
tion within an initial thread is executed. Thereafter, the 
state of the processor at a selected point within the first 
thread is stored, execution of the first thread is terminat- 
ed and a second thread is selected for execution only 
in response to a level two or higher cache miss, thereby 
minimizing processor delays due to memory latency. 
The validity state of each thread is preferably main- 
tained in order to minimize the likelihood of returning to 
a prior thread for execution before the cache miss has 
been corrected. A least recently executed thread is pref- 
erably selected for execution in the event of a nonvalidity 
indication in association with all remaining threads, in 
anticipation of a change to the valid status of that thread 
prior to all other threads. A thread switch bit may also 
be utilized to selectively inhibit thread switching where 
execution of a particular thread is deemed necessary. 

Description 

BACKGROUND OF THE INVENTION 

Technical Field 

The present invention relates in general to an im- 
proved data processing system and in particular to an 
improved high performance multithread data process- 
ing system. Still more particularly the present invention 
relates to a method and system for reducing the impact 
of memory latency in a multithread data processing sys- 
tem. 

Description of the Related Art 

Single tasking operating systems have been avail- 
able for many years within computer systems. In such 
systems, a computer processor executes computer pro- 
grams or program subroutines serially, that is no com- 
puter program or program subroutine can begin to exe- 
cute until the previous computer program or program 
subroutine has terminated. This type of operating sys- 
tem does not make optimum use of the computer proc- 
essor in a case where an executing computer program 
or subroutine must await the occurrence of an external 
event (such as the availability of data or a resource) be- 
cause processor time is wasted. 

This problem has led to the advent of multitasking operating 
systems, in which a computer program is divided into multiple 
program threads. Each of the program threads performs a specific 
task. While a computer processor can execute only 
one program thread at a time, if the thread being exe- 
cuted must wait for the occurrence of an external event, 
i.e., the thread becomes "non-dispatchable," execution 
of a non-dispatchable thread is suspended and the com- 
puter processor executes another thread of the same or 
different computer program to optimize utilization of 
processor assets. Multitasking operating systems have 
also been extended to multiprocessor environments 
where threads of the same or different programs can 
execute in parallel on different computer processors. 
While such multitasking operating systems optimize the 
use of one or more processors, they do not permit the 
application program developer to adequately influence 
the scheduling of the execution of threads. 

Previously developed hardware multithread proc- 
essors which maintain multiple states of different pro- 
grams and permit the ability to switch between those 
states quickly typically switch threads at every memory 
reference, cache miss or stall. Memory latencies in mod- 
ern microprocessors are too long and first level on-chip 
cache sizes are generally quite small. For example, in 
an object-oriented programming environment program 
locality is worse than in traditional environments. Such 
a situation results in increased delays due to increased 
memory access rendering the data processing system 
less cost-effective. 

Existing multithreading techniques describe switching threads 
on a cache miss or a memory reference. A primary example of this 
technique may be reviewed in "Sparcle: An Evolutionary Design for 
Large-Scale Multiprocessors," IEEE Micro, Volume 13, No. 3, 
pp. 48-60, June 1993. As applied in a so-called "RISC" (reduced 
instruction set computing) architecture, multiple register sets 
normally utilized to support function calls are modified to 
maintain multiple threads. Eight overlapping register windows are 
modified to become four non-overlapping register sets, wherein 
each register set is a reserve for trap and message handling. This 
system discloses a thread switch which occurs on each first level 
cache miss that results in a remote memory request. 

While this system represents an advance in the art, modern 
processor designs often utilize a multiple level cache or high 
speed memory which is attached to the processor. The processor 
system utilizes some well-known algorithm to decide what portion 
of its main memory store will be loaded within each level of cache and 
thus, each time a memory reference occurs which is not 
present within the first level of cache the processor must 
attempt to obtain that memory reference from a second 
or higher level of cache. 

It should thus be apparent that a need exists for an 
improved data processing system which can reduce de- 
lays due to memory latency in a multilevel cache system 
utilized in conjunction with a multithread data process- 
ing system. 

SUMMARY OF THE INVENTION 

It is therefore one object of the present invention to 
provide an improved data processing system. 

It is another object of the present invention to pro- 
vide an improved high performance multithread data 
processor system. 

It is yet another object of the present invention to 
provide an improved method and system for reducing 
delays due to memory latency in a multithread data 
processing system. 

The foregoing objects are achieved as is now de- 
scribed. A method and system are disclosed for en- 
hanced performance multithread operation in a data 
processing system which includes a processor, a main 
memory store and at least two levels of cache memory. 
At least one instruction within an initial thread is execut- 
ed. Thereafter, the state of the processor at a selected 
point within the first thread is stored, execution of the 
first thread is terminated and a second thread is selected 
for execution only in response to a level two or higher 
cache miss, thereby minimizing processor delays due 
to memory latency. The validity state of each thread is 
preferably maintained in order to minimize the likelihood 
of returning to a prior thread for execution before the 
cache miss has been corrected. A least recently execut- 
ed thread is preferably selected for execution in the 
event of a nonvalidity indication in association with all 
remaining threads, in anticipation of a change to the valid 
status of that thread prior to all other threads. A thread 
switch bit may also be utilized to selectively inhibit 
thread switching where execution of the current thread 
is deemed necessary. 

The above as well as additional objectives, fea- 
tures, and advantages of the present invention will be- 
come apparent in the following detailed written descrip- 
tion. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The novel features believed characteristic of the in- 
vention are set forth in the appended claims. The inven- 
tion itself, however, as well as a preferred mode of use, 
further objectives and advantages thereof, will best be 
understood by reference to the following detailed de- 
scription of an illustrative embodiment when read in con- 
junction with the accompanying drawings, wherein: 

Figure 1 is a high level block diagram of a data 
processing system which may be utilized to imple- 
ment the method and system of the present inven- 
tion; 

Figure 2 is a high level logic flowchart of a process 
which may be implemented within the data processing 
system of Figure 1 which illustrates basic operation 
in accordance with the method and system of 
the present invention; 

Figure 3 is a high level logic flowchart of a process 
which may be implemented within the data process- 
ing system of Figure 1 which illustrates a simple 
prioritized thread management system in accord- 
ance with the method and system of the present in- 
vention; 

Figure 4 is a high level logic flowchart of a process 
which may be implemented within the data process- 
ing system of Figure 1 which illustrates a preemp- 
tive prioritized thread management system in ac- 
cordance with the method and system of the 
present invention; 

Figure 5 is a high level logic flowchart of a process 
which may be implemented within the data process- 
ing system of Figure 1 which illustrates a first thread 
management system in accordance with the meth- 
od and system of the present invention; and 

Figure 6 is a high level logic flowchart of a process 
which may be implemented within the data process- 
ing system of Figure 1 which illustrates a second 
thread management system in accordance with the 
method and system of the present invention. 



DETAILED DESCRIPTION OF PREFERRED 
EMBODIMENT 

With reference now to the figures and in particular with 
reference to Figure 1, there is depicted a high level block 
diagram of a data processing system 10 which may be utilized to 
implement the method and system of the present invention. In a 
preferred embodiment processor 12 of system 10 is a single 
integrated circuit superscalar microprocessor, which may be 
implemented utilizing any well-known superscalar microprocessor 
system such as the PowerPC Microprocessor manufactured by 
International Business Machines Corporation of Armonk, New York. 
As will be discussed below, data processing system 10 preferably 
includes various units, registers, buffers, memories and other 
sections which are all preferably formed by integrated circuitry. 
As those skilled in the art will appreciate, data processing 
system 10 preferably operates according to reduced instruction 
set computing (RISC) techniques. 

As illustrated, data processing system 10 preferably includes a 
main memory store 14, a data cache 16 and instruction cache 18 
which are interconnected utilizing various bus connections. 
Instructions from instruction cache 18 are preferably output to 
instruction flow unit 34 which, in accordance with the method and 
system of the present invention, controls the execution of 
multiple threads by the various subprocessor units within data 
processing system 10. Instruction flow unit 34 selectively 
outputs instructions to various execution circuitry within data 
processing system 10 including branch unit 26, fixed point unit 
28, load/store unit 30 and floating point unit 32. 

In addition to the various execution units depicted within 
Figure 1, those skilled in the art will appreciate that modern 
superscalar microprocessor systems often include multiple 
versions of each such execution unit. Each of these execution 
units will have as an input source operand information from 
various registers such as general purpose registers 36 and 
floating point registers 40. Additionally, multiple special 
purpose registers 38 may be utilized in accordance with the 
method and system of the present invention to store processor 
state information in response to thread switching. 

In a manner well known to those having ordinary skill in the art, 
in response to a load instruction load/store unit 30 will input 
information from data cache 16 and copy that information to 
selected buffers for utilization by one of the plurality of 
execution units. Data cache 16 is preferably a small memory which 
utilizes high speed memory devices and which stores data which is 
considered likely to be utilized frequently or in the near future 
by data processing system 10. In accordance with an important 
feature of the present invention a second level cache 20 is also 
provided which, in an inclusive system, will include all data 
stored within data cache 16 and a larger amount of data copied 
from main memory store 14. Level two cache 20 is preferably a 
higher speed memory system than main memory store 14 and, by 
storing selected data within level two cache 20 in accordance 
with various well known techniques, the memory latency which 
occurs as a result of a reference to main memory store 14 can be 
minimized. 

A level two cache/memory interface 22 is also provided in 
accordance with the method and system of the present invention. 
As illustrated, a bus 42 is provided between level two 
cache/memory interface 22 and instruction flow unit 34 to 
indicate to instruction flow unit 34 a miss within level two 
cache 20, that is, an attempt to access data within the system 
which is not present within level two cache 20. Further, a 
so-called "Translation Lookaside Buffer" (TLB) 24 is provided 
which contains virtual-to-real address mapping. Although not 
illustrated within the present invention, various additional high 
level memory mapping buffers may be provided such as a Segment 
Lookaside Buffer (SLB) which will operate in a manner similar to 
that described for translation lookaside buffer 24. 

Thus, in accordance with an important feature of the present 
invention, delays due to memory latency within data processing 
system 10 may be reduced by switching between multiple threads in 
response to the occurrence of an event which indicates long 
memory latency may occur. In one embodiment of the system 
depicted within Figure 1 a thread switch will occur, if enabled, 
in response to a level two cache miss on a fetch, that is, an 
attempt by the processor to access the level two cache to 
determine whether or not a memory request can be satisfied and an 
indication that the desired data or instruction is not present 
within the level two cache. This occurrence is typically 
processed by causing the memory request to be satisfied from main 
memory store 14 and the memory latency which occurs during this 
period triggers, in accordance with the method and system of the 
present invention, a thread switch. In alternate embodiments of 
the present invention a thread switch is triggered only in 
response to the occurrence of those events which will take a 
longer period of time to complete than is required to refill the 
instruction pipeline (typically 5 or 6 cycles). Thus, a thread 
switch may be triggered in response to a Translation Lookaside 
Buffer (TLB) Miss or Invalidate, a Segment Lookaside Buffer (SLB) 
Miss or Invalidate, a failed conditional store operation or other 
operations which require, on average, a period of time which is 
longer than the time required for a thread switch. By only 
switching threads in response to such events the necessity for 
increased complexity and replication of pipeline latches and 
additional pipeline states is avoided. 
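
The switching criterion described above can be sketched as a simple comparison between the expected duration of an event and the pipeline refill time. The following is a minimal illustration only: the latency values in the table are hypothetical numbers chosen for the example (only the refill figure of 5 or 6 cycles comes from the text), and the names `should_switch` and `ASSUMED_EVENT_LATENCY` are not part of the disclosure.

```python
# Pipeline refill time from the text ("typically 5 or 6 cycles").
PIPELINE_REFILL_CYCLES = 6

# Hypothetical average latencies in cycles, assumed for illustration only.
ASSUMED_EVENT_LATENCY = {
    "l1_miss_l2_hit": 4,             # satisfied by the level two cache
    "l2_miss": 60,                   # satisfied by main memory store
    "tlb_miss": 40,
    "slb_miss": 40,
    "failed_conditional_store": 20,
}


def should_switch(event: str, switching_enabled: bool = True) -> bool:
    """Return True when a thread switch is worthwhile for this event.

    A switch is taken only if switching is enabled and the event is
    expected to last longer than refilling the instruction pipeline.
    """
    if not switching_enabled:
        return False
    return ASSUMED_EVENT_LATENCY[event] > PIPELINE_REFILL_CYCLES
```

Under these assumed numbers a first level miss that hits in the level two cache does not trigger a switch, while a level two miss does.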

A thread switch is accomplished, as described herein, by 
providing a thread state register within a dedicated special 
purpose register 38. This thread state register preferably 
includes an indication of the current thread number, an 
indication of whether single-thread or multi-thread operation is 
enabled and a validity indication bit for each thread. Thus, if 
four threads are permitted within data processing system 10, 
seven bits are required to indicate this information: two bits 
for the thread number, one bit for the operating mode and four 
validity bits. Additionally, two existing special purpose 
registers are utilized as save-restore registers to store the 
address of the instruction which caused the level two cache miss 
and store the machine state register. 
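
The seven-bit count for four threads can be checked with a small packing sketch. The field order used here (thread number in the low bits, then the mode bit, then one validity bit per thread) is an assumption made for illustration; the text specifies only which fields exist, not how they are laid out. The helper names `pack_tsr` and `unpack_tsr` are likewise hypothetical.

```python
NUM_THREADS = 4
# Bits needed to encode thread numbers 0..NUM_THREADS-1 (2 bits for four threads).
THREAD_ID_BITS = (NUM_THREADS - 1).bit_length()
# Thread number + single/multi-thread mode bit + one validity bit per thread.
TSR_WIDTH = THREAD_ID_BITS + 1 + NUM_THREADS


def pack_tsr(current, multithread, valid):
    """Pack the thread state fields into one integer (assumed layout)."""
    if not 0 <= current < NUM_THREADS:
        raise ValueError("current thread number out of range")
    if len(valid) != NUM_THREADS:
        raise ValueError("one validity flag per thread is required")
    word = current
    word |= int(multithread) << THREAD_ID_BITS
    for t, is_valid in enumerate(valid):
        word |= int(is_valid) << (THREAD_ID_BITS + 1 + t)
    return word


def unpack_tsr(word):
    """Recover (current, multithread, valid) from a packed word."""
    current = word & ((1 << THREAD_ID_BITS) - 1)
    multithread = bool((word >> THREAD_ID_BITS) & 1)
    valid = [bool((word >> (THREAD_ID_BITS + 1 + t)) & 1)
             for t in range(NUM_THREADS)]
    return current, multithread, valid
```

With `NUM_THREADS = 4`, `TSR_WIDTH` evaluates to 7, matching the count given in the text.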

In accordance with the method and system of the 
present invention level two cache/memory interface 22 
preferably permits multiple outstanding memory re- 
quests. That is, one outstanding memory request per 
thread. Thus, when a first thread is suspended in re- 
sponse to the occurrence of a level two cache miss a 
second thread would be able to access the level two 
cache for data present therein. If the second thread also 
results in a level two cache miss another memory re- 
quest will be issued and thus multiple memory requests 
must be maintained within level two cache/memory interface 
22. Further, in order to minimize so-called 
"thrashing," the method and system of the present invention 
requires that at least a first instruction within each 
thread must complete. Thus, if all threads within the sys- 
tem are awaiting a level two cache miss and the first 
thread is resumed it will not find the required data; how- 
ever, in response to a requirement that at least the first 
instruction must complete this thread will simply wait un- 
til the cache miss has been satisfied. 

Thus, those skilled in the art should appreciate that 
"multithreading," as defined within the present disclosure 
wherein multiple independent threads are executing, may be 
accomplished in hardware in accordance with the method and system 
of the present invention and may be utilized to greatly reduce 
the delay due to memory latency by maintaining the state of 
multiple threads (preferably two or three in accordance with the 
current design) and selectively switching between those threads 
only in response to a second level or higher cache miss. 

Referring now to Figure 2 there is depicted a high 
level logic flowchart of a process which may be imple- 
mented within the data processing system of Figure 1 
which illustrates basic operation in accordance with the 
method and system of the present invention. As depict- 
ed, the process begins at block 60 and thereafter passes 
to block 62. Block 62 illustrates the loading of all threads. 
The process then passes to block 64 which depicts the 
setting of the current thread i = 0. Block 66 then depicts 
the execution of thread i until such time as the process 
passes to block 68. Block 68 illustrates a determination of 
whether or not a level two cache or translation lookaside buffer 
(TLB) miss has occurred. In the event no such miss occurs the 
process returns, in an iterative fashion, to block 66 to continue to 
execute thread i. 

Referring again to block 68, in the event a level two 
cache or translation lookaside buffer miss has occurred, 
the process passes to block 70. Block 70, in accordance 
with an important feature of the present invention, illus- 
trates a determination of whether or not thread switching 
within the system is enabled. Those having ordinary skill 
in the art will appreciate that in selected instances execution 
of a particular thread will be desirable and thus, 
the method and system of the present invention pro- 
vides a technique whereby the switching between mul- 
tiple threads may be disabled. In the event thread 
switching is not enabled the process passes from block 
70 back to block 66 in an iterative fashion to await the 
satisfaction of the level two cache miss. 

Referring again to block 70, in the event thread 
switching is enabled the process passes to block 72. 
Block 72 illustrates the saving of the state of instruction 
register and the machine state register for thread i uti- 
lizing the special purpose registers (see Figure 1) and 
the process then passes to block 74. Block 74 illustrates 
the changing of the current thread to the next thread by 
incrementing i, accessing the appropriate registers and 
the process then passes to block 76. Block 76 illustrates 
the setting of the thread state for the new current thread 
and the process then returns to block 66 in an iterative 
fashion. 
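
The basic flow of Figure 2 can be sketched as a small state holder whose miss handler mirrors blocks 68 through 76. This is an illustrative model only: the class name `Processor`, the dictionary standing in for the save-restore registers, and the wrap-around from the last thread back to thread 0 are assumptions, since the text states only that the next thread is reached by incrementing i.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    num_threads: int
    switching_enabled: bool = True
    current: int = 0  # block 64: current thread i = 0
    # Stand-in for the save-restore special purpose registers, per thread.
    saved_state: dict = field(default_factory=dict)

    def on_miss(self, instruction_address: int, machine_state: int) -> int:
        """Handle a level two cache or TLB miss; return the thread to run."""
        if not self.switching_enabled:
            # Block 70: switching disabled, keep waiting on the same thread.
            return self.current
        # Block 72: save the instruction address and machine state of thread i.
        self.saved_state[self.current] = (instruction_address, machine_state)
        # Block 74: move to the next thread (wrap-around is assumed here).
        self.current = (self.current + 1) % self.num_threads
        # Block 76: the new current thread's state would be set at this point.
        return self.current
```

Calling `on_miss` three times on a three-thread `Processor` cycles through threads 1, 2 and back to 0 under the assumed wrap-around.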

With reference now to Figure 3 there is depicted a 
high level logic flowchart which illustrates a process 
which may be implemented within the data processing 
system of Figure 1 which depicts a simple prioritized 
thread management system in accordance with the 
method and system of the present invention. As illus- 
trated, this process begins at block 80 and thereafter 
passes to block 82. Block 82 illustrates the loading of all 
threads (0, n - 1) and the assignment of an associated 
priority for each thread. The process then passes to 
block 84 which depicts the setting of the current thread 
i equal to the thread having the highest priority. There- 
after, the process passes to block 86. 

Block 86 illustrates the execution of thread i and the 
process then passes to block 88. Block 88 illustrates a 
determination of whether or not a level two cache or 
translation lookaside buffer miss has occurred and if not, 
as above, the process returns to block 86 in an iterative 
fashion to continue to execute thread i. 

Still referring to block 88, in the event a level two 
cache or translation lookaside buffer miss has occurred 
the process passes to block 90. Block 90, as described 
above, illustrates a determination of whether or not 
thread switching is enabled, and if not, the process re- 
turns to block 86 in an iterative fashion. However, in the 
event thread switching is enabled, the process passes 
to block 92. 

Block 92 depicts the saving of the state of thread i and the 
marking of that thread as "NOT READY." Thereafter, the process 
passes to block 94. Block 94 depicts the concurrent processing of 
the switch event and the marking of that thread as "READY" when 
the switch event has been resolved, that is, when the level two 
miss has been satisfied by obtaining the desired data from main 
memory store. Continuing, the process passes to block 96, while 
processing the switch event as described above, to determine 
whether or not another thread is ready for execution. If so, the 
process passes to block 98 which illustrates the changing of the 
current thread to the thread having the highest priority and a 
"READY" indication. That thread's thread state is then set, as 
depicted within block 102, and the process then returns to block 
86, in an iterative fashion as described above. 

Referring again to block 96, in accordance with an important 
feature of the present invention, in the event no other thread 
within the system indicates "READY" the process passes to block 
100. Block 100 illustrates the changing of the current thread to 
the thread which is least recently run. This occurs as a result 
of a decision that the thread which was least recently run is the 
thread most likely to resolve its switch event prior to a 
subsequent thread and thus, delays due to memory latency will be 
minimized by selection of this thread as the current thread. The 
process then passes to block 102 which illustrates the setting of 
the thread state for this selected thread and the process then 
returns to block 86 in an iterative fashion. 
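
The selection rule of blocks 96 through 100 can be sketched as follows. The `Thread` record, the convention that a larger number means a higher priority, and the `last_run` timestamp used to find the least recently run thread are all assumptions introduced for this example; the text does not say how priority or recency are represented.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    tid: int
    priority: int      # larger value = higher priority (assumed convention)
    ready: bool = True
    last_run: int = 0  # hypothetical timestamp of the thread's last execution


def select_next(threads, current):
    """Choose the thread to run after `current` has been suspended."""
    others = [t for t in threads if t is not current]
    ready = [t for t in others if t.ready]
    if ready:
        # Block 98: highest priority thread with a READY indication.
        return max(ready, key=lambda t: t.priority)
    # Block 100: no thread is READY, so fall back to the least recently
    # run thread, which is expected to resolve its switch event first.
    candidates = others or [current]
    return min(candidates, key=lambda t: t.last_run)
```

The fallback encodes the reasoning given in the text: the thread that has waited longest is the most likely to have its outstanding miss satisfied next.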

Referring now to Figure 4 there is depicted a high level logic 
flowchart of a process which may be implemented within the data 
processing system of Figure 1 which depicts a preemptive 
prioritized thread management system in accordance with the 
method and system of the present invention. As illustrated, this 
process begins at block 110 and thereafter passes to block 112. 
Block 112 illustrates the loading of all threads (0, n - 1) and 
the assignment of an associated priority to each thread. 
Thereafter, the process passes to block 114. Block 114 
illustrates the setting of the current thread i equal to the 
thread having the highest priority. The process then passes to 
block 116 which depicts the execution of thread i. 

Next, the process passes to block 118. Block 118 illustrates a 
determination of whether or not a level two cache or translation 
lookaside buffer miss has occurred and if not, the process passes 
to block 120. Block 120 illustrates a determination of whether or 
not a higher priority thread has now been indicated as "READY" 
and if not, the process returns to block 116 in an iterative 
fashion, to continue to execute thread i. 

Referring again to block 118, in the event a level two cache or 
translation lookaside buffer miss has occurred the process passes 
to block 122. As described above, block 122 illustrates a 
determination of whether or not thread switching is enabled and 
if not, the process returns to block 116 in an iterative fashion. 
Referring again to block 118, in the event a level two cache or 
translation lookaside buffer miss has not occurred, but, as 
determined in block 120, a higher priority thread than the 
current thread now indicates "READY" the process also passes to 
block 122. Block 122 then determines whether or not thread 
switching is enabled and if not, the process returns to block 116 
in an iterative fashion. 

Still referring to block 122, in the event thread switching is 
enabled, and either a level two cache or translation lookaside 
buffer miss has occurred, or a higher priority thread than the 
current thread now indicates "READY," the process passes to block 
124. Block 124 illustrates the saving of the state of thread i 
and the marking of that thread as "NOT READY." Next, the process 
passes to block 126. Block 126 illustrates the concurrent 
processing of the switch event, if any, and the marking of the 
previously current thread as "READY" when that switch event has 
completed. Of course, those skilled in the art will appreciate 
that in the event the previously current thread was suspended in 
response to a higher priority thread indicating a "READY" state 
no switch event will be processed and the previously current 
thread will be marked "READY." Next, the process passes to block 
128. Block 128 illustrates a determination of whether or not 
another thread is ready and, if the process has occurred as a 
result of a level two cache or translation lookaside buffer miss, 
a determination of the ready state of each thread will be 
required; however, in the event the thread switch occurs as a 
result of a higher priority thread indicating a "READY" state 
then the higher priority thread will clearly be available, as 
determined at block 128. Thereafter, the process passes to block 
130 which illustrates the changing of the current thread to the 
thread having the highest priority and an indication of "READY." 

Alternately, still referring to block 128, in the event the 
thread switch has occurred as a result of a level two cache or 
translation lookaside buffer miss and no other thread is ready, 
the process passes to block 132. As described above, block 132 
illustrates the changing of the current thread to the thread 
which was least recently run in accordance with the theory that 
this thread will be the first thread to achieve a "READY" state. 
Thereafter, the process again passes to block 134 which 
illustrates the setting of the thread state for the new current 
thread and the process then returns to block 116, in an iterative 
fashion, as described above. 
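
The two ways of leaving thread i in Figure 4 (a miss, or preemption by a higher priority thread that has become ready) can be sketched as a pair of small functions. The function names, the string labels for the reason, and the larger-number-is-higher priority convention are assumptions made for this illustration.

```python
from typing import Iterable, Optional


def switch_reason(miss: bool,
                  current_priority: int,
                  ready_priorities: Iterable[int],
                  switching_enabled: bool = True) -> Optional[str]:
    """Return why thread i should be suspended, or None to keep running it."""
    if miss:
        # Block 118: level two cache or TLB miss.
        reason = "miss"
    elif any(p > current_priority for p in ready_priorities):
        # Block 120: a higher priority thread now indicates READY.
        reason = "preempted"
    else:
        return None
    # Block 122: either path is honored only while switching is enabled.
    return reason if switching_enabled else None


def ready_after_suspend(reason: str) -> bool:
    """Blocks 124-126: readiness of the suspended thread right after the switch.

    A preempted thread has no switch event pending, so it is READY at once;
    a thread suspended on a miss stays NOT READY until the miss is satisfied.
    """
    return reason == "preempted"
```

Separating the reason from the readiness makes explicit the point in the text that a preempted thread never waits on a switch event.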

With reference now to Figure 5 there is depicted a 
high level logic flowchart which illustrates a process 
which may be implemented within the data processing 
system of Figure 1 which depicts a first thread manage- 
ment system in accordance with the method and system 
of the present invention. As depicted, this process be- 
gins at block 140 and thereafter passes to block 142. 
Block 142 illustrates the loading of an idle loop for each 
thread (0, n - 1). Next, the current thread is set i = 0, as 
depicted in block 144. 

The process then passes to block 146 which illus- 
trates the execution of thread i and the process then 
passes to block 148. Block 148 illustrates a determina- 
tion of the occurrence of a switch event while thread 
switching is enabled. If this occurs the process passes 
to block 150 which illustrates the switching of threads 
and the setting of a new current thread. The process 
then returns to block 146, in an iterative fashion. 

Referring again to block 148, in the event a determination is 
made that no switch event has occurred, the process passes to 
block 152. Block 152 illustrates a determination of whether or 
not a task within the current thread has ended and if not, the 
process returns to block 146 in an iterative fashion to continue 
execution. However, in the event a task has ended the process 
passes to block 154. Block 154 depicts a determination of whether 
or not another task is ready for execution within the current 
thread and if so, the process passes to block 156. Block 156 
illustrates the loading of the new task for the current thread 
and this process then returns, in an iterative fashion, to block 
146 to continue execution of the current thread. 

Still referring to block 154, in the event no further tasks are 
ready within the currently executing thread the process passes to 
block 158. Block 158 illustrates the starting of the idle loop 
for thread i and the process then returns, in an iterative 
fashion, to block 146 to await the occurrence of one of the 
enumerated events. 
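
The task handling of blocks 152 through 158 can be sketched as a single decision function. The queue of ready tasks, its first-in first-out order, and the `IDLE` placeholder name are assumptions for this example; the text does not specify how ready tasks within a thread are ordered.

```python
from collections import deque

IDLE = "idle-loop"  # placeholder for the idle loop loaded in block 142


def next_work(task_ended: bool, current_task: str, ready_tasks: deque) -> str:
    """Decide what thread i runs next when no switch event is pending."""
    if not task_ended:
        # Block 152: the current task has not ended, keep executing it.
        return current_task
    if ready_tasks:
        # Blocks 154-156: another task is ready, load it for this thread.
        return ready_tasks.popleft()
    # Block 158: nothing is ready, start the idle loop for thread i.
    return IDLE
```

A thread therefore stays resident and falls back to its idle loop rather than being removed when it runs out of work.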

Finally, referring to Figure 6 there is depicted a high level 
logic flowchart of a process which may be implemented within the 
data processing system of Figure 1 which depicts a second thread 
management system in accordance with the method and system of the 
present invention. As illustrated, this process begins at block 
170 and thereafter passes to block 172. Block 172 illustrates the 
loading of an idle loop for each thread (0, n - 1). Thereafter, 
as depicted within block 174, the current thread i is set = 0. 
Next, the process passes to block 176. Block 176 illustrates the 
marking of the current thread as "VALID" and the marking of all 
other threads as "NOT VALID." The process then passes to block 
178. Block 178 illustrates the execution of thread i. 

Thereafter, as depicted in block 180, a determination is made as 
to whether or not a switch event has occurred while switching is 
enabled. If so, the process passes to block 182. Block 182 
indicates a determination of whether or not another thread within 
the system is "VALID." If not, the process returns to block 178, 
in an iterative fashion, to continue execution of thread i. 
Alternately, in the event another thread is determined as "VALID" 
the process passes to block 184. Block 184 illustrates the 
switching of the current thread to the new thread chosen from 
among those threads indicating "VALID" state. The process then 
returns to block 178 to execute the new current thread in the 
manner described above. 

Referring again to block 180, in the event a determination is 
made that no switch event has occurred or that switching is not 
enabled, the process passes to block 186. Block 186 illustrates a 
determination of whether or not the current task has ended and if 
so, the process passes to block 188. Block 188 illustrates a 
determination of whether or not another task within the current 
thread is ready for execution and if so, the process passes to 
block 190. Block 190 illustrates the loading of the new task for 
the current thread and the process then returns to block 178, in 
an iterative fashion to continue execution of the current thread. 

Referring again to block 188, in the event a current
task has ended, as determined at block 186, and a
subsequent task is not ready, the process passes to block
194. Block 194 illustrates a determination of whether or
not any other thread within the system indicates "VALID."
If not, the process passes to block 196 which illustrates
the starting of the idle loop for thread i and the process
then returns to block 178, in an iterative fashion.
However, in the event another thread within the system
indicates "VALID," the process passes from block 194 to
block 200. Block 200 illustrates the marking of the
current thread as "NOT VALID" and the process then
returns to block 184 to change the current thread to a new
thread chosen from among the valid threads.
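
The handling of an ended task, as described for blocks 188 through 200, may be sketched in Python as follows. The function on_task_end, the per-thread ready_queue and the IDLE placeholder are hypothetical names introduced for illustration only, and the choice of the lowest-numbered valid thread at block 184 is again an assumption rather than a stated requirement.

```python
IDLE = "idle loop"  # hypothetical placeholder for the idle loop task


def on_task_end(tasks, valid, current, ready_queue):
    """Handle the end of the current task; return the thread to execute next."""
    if ready_queue:
        # Blocks 188 and 190: another task is ready, load it into thread i.
        tasks[current] = ready_queue.pop(0)
        return current
    # Block 194: does any other thread indicate "VALID"?
    others = [i for i, v in enumerate(valid) if v and i != current]
    if not others:
        tasks[current] = IDLE   # block 196: start the idle loop for thread i
        return current
    valid[current] = False      # block 200: mark the current thread NOT VALID
    return others[0]            # block 184 (assumed policy: lowest index)
```

In this sketch a thread is marked "NOT VALID" only when it has no work of its own and another valid thread is available to take over execution.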

Referring again to block 186, in the event the
current task is not ended, the process passes to block 192.
Block 192 illustrates a determination of whether or not
a new task has become ready and if not, the process
returns to block 178, in an iterative fashion, to continue
the execution of thread i in the manner described above.
However, in the event a new task has become ready, as
determined at block 192, the process passes to block
198. Block 198 illustrates a determination of whether or
not any "NOT VALID" threads are present among the
threads within the system and if not, the process returns
to block 178, in an iterative fashion, to continue to
execute thread i. However, in the event a "NOT VALID"
thread is present within the system, the process passes
to block 202. Block 202 illustrates the selection of one
"NOT VALID" thread, the marking of that thread as
"VALID" and the loading of the task which is now ready into
that thread. The process then returns to block 178, in
an iterative fashion, to continue to execute thread i.
Thereafter, in the event a thread switch event occurs, a
"VALID" thread having the new task present therein is
ready for execution.
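
The placement of a newly ready task, as described for blocks 198 and 202, may be sketched in Python as follows. The function name on_new_task_ready and its parameters are hypothetical. The text states only that one "NOT VALID" thread is selected, so taking the first such thread found is an assumption made for this sketch.

```python
from typing import Optional


def on_new_task_ready(tasks, valid, new_task) -> Optional[int]:
    """Load new_task into a NOT VALID thread; return its index, or None."""
    for i, is_valid in enumerate(valid):
        if not is_valid:            # block 198: a NOT VALID thread is present
            valid[i] = True         # block 202: mark that thread VALID
            tasks[i] = new_task     # block 202: load the ready task into it
            return i
    # No NOT VALID thread: nothing is loaded and thread i keeps executing.
    return None
```

The currently executing thread is not changed by this step; the newly valid thread simply becomes a candidate at the next thread switch event.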

Claims

1. A method for enhanced performance multithread
operation in a data processing system which
includes a processor, a main memory store and at
least two levels of cache memory, said method
comprising the steps of:

executing at least one instruction within a first
thread;

thereafter storing a state of said processor at a
selected point within said first thread, terminating
execution of said first thread and switching
execution to a second thread only in response
to an occurrence of an identified event having
a delay associated therewith which exceeds an
amount of time required for a thread switch; and

executing at least one instruction within said
second thread wherein processing delays due
to memory access latency are minimized.

2. The method according to Claim 1, further including
the steps of:

storing an indication of non-validity in association
with said first thread in response to said
occurrence of said identified event and

removing said indication of non-validity in
association with said first thread following a
completion of said identified event.

3. The method according to Claim 1, wherein said data
processing system includes a plurality of registers
and wherein said step of storing a state of said
processor at a selected point within said first thread
comprises storing a state of said processor at a
selected point within said first thread within a register
associated with said first thread.

4. The method according to any one of Claims 1 to 3,
further including the step of determining a validity
status for each thread within said data processing
system and selecting a second thread for execution
in response to said determination,

selecting a least recently executed thread for
execution in response to an indication of
non-validity in association with all remaining threads
within said data processing system following
said occurrence of said identified event, and

selectively inhibiting execution of a subsequent
thread within said data processing system in
response to a state of a switch enable bit.



5. The method according to any one of Claims 1 to 4,
further including the step of selecting said second
thread for execution following said occurrence of
said identified event in response to a priority
indication associated with each thread within said data
processing system.

6. The method for enhanced performance multithread
operation in a data processing system according to
any one of the previous claims,

wherein said switching execution is performed
in response to a level two or higher cache miss;
and

which further comprises the step of:

maintaining an address indication for said
level two or higher cache miss before executing
said at least one instruction.



7. A system for enhanced performance multithread
operation in a data processing system which
includes a processor, a main memory store and at least
two levels of cache memory, said system
comprising:

means for executing at least one instruction 
within a first thread; 

means for thereafter storing a state of said 
processor at a selected point within said first 
thread, terminating execution of said first 
thread and switching execution to a second 
thread only in response to a level two or higher 
cache miss; 

means for maintaining an address indication for 
said level two or higher cache miss; and 

means for executing at least one instruction 
within said second thread wherein processing 
delays due to memory access latency are min- 
imized. 

8. The system according to Claim 7, further including:

means for storing an indication of non-validity 
in association with said first thread in response 
to said level two or higher cache miss, 

means for removing said indication of non-va- 
lidity in association with said first thread follow- 
ing a retrieval from main memory of data or in- 
struction at said maintained address indication 
for said level two or higher cache miss. 

9. The system according to Claim 7, wherein said data 
processing system includes a plurality of registers 
and wherein said means for storing a state of said 
processor at a selected point within said first thread 
comprises storing a state of said processor at a se- 
lected point within said first thread within a register 
associated with said first thread. 

10. The system according to Claim 7, 8 or 9 further
including:

means for determining a validity status for each
thread within said data processing system and
selecting a second thread for execution in
response to said determination,

means for selecting a least recently executed
thread for execution in response to an
indication of non-validity in association with all
remaining threads within said data processing
system following said level two or higher cache
miss,

means for selectively inhibiting execution of a
subsequent thread within said data processing
system in response to a state of a switch enable
bit, and

means for selecting said second thread for
execution following said level two or higher cache
miss in response to a priority indication
associated with each thread within said data
processing system.

11. A computer program product comprising a system
for enhanced multithread operation in a data
processing system according to any one of claims
7 to 10.